Podcastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription
Authors
Abstract
This paper presents acoustic-model training techniques for improving the automatic transcription of podcasts. A typical approach to acoustic modeling is to create a task-specific corpus containing hundreds (or even thousands) of hours of speech data along with accurate transcriptions. This approach, however, is impractical for the podcast-transcription task, because manually transcribing the large amounts of speech needed to cover the many different types of podcast content would be too costly and time-consuming. To solve this problem, we introduce collaborative training of acoustic models on the basis of the wisdom of crowds: the transcriptions of podcast speech data are generated by anonymous users of our web service, PodCastle. We then describe a podcast-dependent acoustic modeling system that uses RSS metadata to handle differences in acoustic conditions across podcast speech data. Experimental results on actual podcast speech data confirmed the effectiveness of the proposed acoustic-model training.
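The abstract does not say how the user corrections or the RSS metadata are represented internally. The Python sketch below assumes one plausible arrangement, purely for illustration: each speech segment carries the enclosure URL of its episode and a flag saying whether users corrected its transcript; RSS 2.0 feeds are parsed to map enclosure URLs to a podcast (channel title); and only crowd-corrected segments are grouped into per-podcast training sets, on the assumption that episodes of the same podcast share acoustic conditions (speakers, recording setup). The names `Segment`, `feed_for_enclosures`, `podcast_training_sets`, and the `min_segments` fallback threshold are hypothetical and are not PodCastle's implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One speech segment of a podcast episode (hypothetical record layout)."""
    audio_url: str    # enclosure URL of the episode the segment belongs to
    transcript: str   # transcript text after anonymous-user corrections
    corrected: bool   # True if at least one user edited this segment


def feed_for_enclosures(rss_xml: str) -> dict[str, str]:
    """Map each episode's enclosure URL to its channel title in an RSS 2.0 feed."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    title = channel.findtext("title", default="unknown")
    mapping: dict[str, str] = {}
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            mapping[enclosure.get("url")] = title
    return mapping


def podcast_training_sets(
    segments: list[Segment],
    enclosure_to_feed: dict[str, str],
    min_segments: int = 2,
) -> dict[str, list[Segment]]:
    """Group crowd-corrected segments into per-podcast training sets.

    Segments whose episode is not in any known feed, and podcasts with fewer
    than `min_segments` corrected segments (a hypothetical threshold), are
    pooled into a shared "general" set instead.
    """
    per_feed: dict[str, list[Segment]] = defaultdict(list)
    for seg in segments:
        if not seg.corrected:
            continue  # uncorrected recognizer output is not used as a training target here
        per_feed[enclosure_to_feed.get(seg.audio_url, "general")].append(seg)

    training_sets: dict[str, list[Segment]] = defaultdict(list)
    for feed, segs in per_feed.items():
        key = feed if len(segs) >= min_segments else "general"
        training_sets[key].extend(segs)
    return dict(training_sets)
```

The fallback to a shared "general" set reflects a common trade-off: a podcast-dependent model needs enough in-domain data to be worth training, so sparsely corrected podcasts are assumed here to be better served by a pooled model.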
Related papers
PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds
This paper presents a language-model training method for improving automatic transcription of online spoken contents. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large-sized task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome difficulties in...
Automatic Transcription for a Web 2.0 Service to Search Podcasts (INTERSPEECH 2007)
This paper describes speech recognition techniques that enable a Web 2.0 service “PodCastle” where users can search and read transcribed texts of podcasts, and correct recognition errors in those texts. Most previous speech recognizers had difficulties transcribing podcasts because podcasts include various kinds of contents recorded in different conditions and cover recent topics that tend to h...
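Podcastle: Improvements of Speech Recognition by Using Acoustic Modeling Based on Wisdom of Crowds
1 Introduction. We develop and operate PodCastle [1]–[3], a social annotation system that automatically transcribes podcasts by speech recognition so that users can not only run full-text searches over them but also browse and edit the transcripts in detail. Podcasts are diverse speech data recorded in real-world environments, and it is difficult to achieve high recognition rates with conventional speech recognition technology. PodCastle therefore adopts a framework in which many users cooperate in correcting (annotating) recognition errors, so that the speech recognition rate improves while the system is in operation. In this way, we aim not only to improve the quality of the search service but also to raise the overall level of speech recognition technology. In this study, as part of the above framework, we use the wisdom of crowds obtained through PodCastle, namely users' corrections of speech recognition errors, for acoustic...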
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research has shown that this theory can significantly increase the stability and performance of...
PodCastle: A Spoken Document Retrieval Service Improved by Anonymous User Contributions
In this invited paper, we introduce a public web service, PodCastle, that provides full-text searching of speech data (Japanese podcasts) on the basis of automatic speech recognition technologies. This is an instance of our research approach, Speech Recognition Research 2.0, which is aimed at providing users with a web service based on Web 2.0 so that they can experience state-of-the-art speech...